Effect Size Estimation in Neuroimaging

Abstract

A central goal of translational neuroimaging is to establish robust links between brain measures and clinical outcomes. Success hinges on the development of brain biomarkers with large effect sizes. With large enough effects, a measure may be diagnostic of outcomes at the individual patient level. Surprisingly, however, standard brain-mapping analyses are not designed to estimate or optimize the effect sizes of brain-outcome relationships, and estimates are often biased. Here, we review these issues and how to estimate effect sizes in neuroimaging research.

Effect size is a unit-free description of the strength of an effect, independent of sample size. Examples include Cohen d, Pearson r, and number needed to treat.1,2 For a given sample size (N), these can be converted to a t or z score (eg, for a one-sample test, Cohen d is $t/\sqrt{N}$). But t, z, F, and P values depend on sample size and relate to the presence of an effect (statistical significance), not its magnitude. By contrast, effect size describes a finding's practical significance, which determines its clinical importance. This is an important distinction because small effects can reach statistical significance given a large enough sample, even if they are unlikely to be of practical importance or replicable across diverse samples.3

Traditional neuroimaging studies are not designed to estimate effect sizes. A typical analysis tests for effects at each of 50 000 to 350 000 brain voxels. Post hoc effect sizes are then reported selectively for a small subset of significant voxels. This practice creates bias, making effect size estimates larger than their true values.4 It is like a mediocre golfer who plays 5000 holes over the course of his career but only reports his 10 best holes. Bias is introduced because the best performance, selected post hoc, is not representative of expected performance. The Figure shows a simulation in which the true effect size in a set of voxels is d = 0.5.
Once noise is added and a statistical test (t test) is conducted across 30 individuals, all significant voxels have an estimated effect size greater than the true effect. Why does this occur? Voxels tend to be significant if they show a true effect and have noise that favors the hypothesis. Correcting for multiple comparisons reduces false positives but actually increases this optimistic bias.6 As statistical thresholds become more stringent, an increasingly small subset of tests with favorable noise will reach significance, making the estimated post hoc effect size grow. In sum, conducting a large number of tests inherently induces selection bias, which invalidates effect size estimates.

To overcome selection bias, we must reduce the number of statistical tests performed. One solution is to test a single, predefined region of interest. However, it is rare to consider only 1 region and discard valuable data. In addition, many symptoms and outcomes of interest are increasingly thought to be distributed across brain networks.5 It can also be tempting to redefine the boundaries of regions of interest post hoc after looking at the results, a form of P hacking that invalidates both hypothesis tests and effect size estimates.

An alternative approach is to integrate effects across multiple voxels into 1 model of the outcome, which is then tested on new observations (ie, new patients). Instead of testing each voxel separately, associations with clinical outcomes are combined into a single model, and a single prediction is made for each patient. This approach is common in clinical research; for example, multiple factors, like diet, exercise, and hormone levels, are combined into models of disease risk. Neuroimaging models are based on voxels or network measures rather than risk factors, but the principle is the same. As long as (1) the model makes a single prediction for each patient and (2) predictions are tested on patient samples independent of those used to derive the model, then effect size estimates are unbiased.
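The selection bias described above can be reproduced with a small simulation. The sketch below is an illustration only, not the code behind the Figure: it assumes 5000 independent voxels with a true d of 0.5, unit-variance Gaussian noise, a one-sample t test across 30 individuals, and approximate critical t values for 29 degrees of freedom. The function names are hypothetical.

```python
import numpy as np


def simulate_voxel_effects(true_d=0.5, n_subjects=30, n_voxels=5000, seed=0):
    """Return the estimated Cohen d and one-sample t statistic for each voxel."""
    rng = np.random.default_rng(seed)
    # Every simulated voxel carries the same true effect plus unit-variance noise.
    data = rng.normal(loc=true_d, scale=1.0, size=(n_subjects, n_voxels))
    d_hat = data.mean(axis=0) / data.std(axis=0, ddof=1)
    t_stat = d_hat * np.sqrt(n_subjects)  # one-sample t = d * sqrt(N)
    return d_hat, t_stat


def mean_reported_effect(d_hat, t_stat, t_threshold):
    """Average estimated d over the voxels that pass the threshold."""
    significant = t_stat > t_threshold
    return d_hat[significant].mean(), int(significant.sum())


if __name__ == "__main__":
    d_hat, t_stat = simulate_voxel_effects()
    print(f"all voxels: mean d = {d_hat.mean():.2f}")
    # Assumed thresholds: roughly two-sided p < .05 (t > 2.05) and
    # roughly two-sided p < .001 (t > 3.66) at 29 degrees of freedom.
    for t_threshold in (2.05, 3.66):
        mean_d, n_sig = mean_reported_effect(d_hat, t_stat, t_threshold)
        print(f"t > {t_threshold}: {n_sig} voxels, mean d = {mean_d:.2f}")
```

Under these assumptions, a voxel can pass the stricter threshold only if its estimated d exceeds 3.66/√30, or about 0.67, so every reported effect is above the true value of 0.5, and tightening the threshold raises the average reported effect further.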
A growing number of studies use machine learning and multivoxel pattern analysis to integrate brain information into predictive models. Effect sizes are assessed via prospective application of the model to new, "out-of-training-sample" patients, often using an iterative strategy of training and testing on different subsets of patients, known as cross-validation7 (see Chang et al,5 for example). There are ways that cross-validation can fail, and it is possible to overfit a cross-validated data set by training many models and picking the best. However, if a model is tested prospectively on new, independent data sets without changing its parameters, then unbiased estimates of effect sizes can be obtained. Bias, or lack thereof, can also be assessed with permutation tests. Because integrated models combine information distributed across the brain in an optimized way, these models can substantially outperform single regions in predicting outcomes (Figure, C [adapted from data in Chang et al5]). Thus, such models provide a promising way to establish meaningful associations between brain measures and clinically relevant outcomes.
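As a rough illustration of the out-of-sample logic above, the following sketch fits one multivoxel model and makes a single cross-validated prediction per patient. Everything in it is an assumption made for the example: the data are simulated, ridge regression stands in for whatever model a given study uses, and the function names, penalty, and fold count are hypothetical.

```python
import numpy as np


def simulate_patients(n_patients=100, n_voxels=500, seed=0):
    """Simulated brain data with a weak signal spread across 50 voxels."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_patients, n_voxels))
    w_true = np.zeros(n_voxels)
    w_true[:50] = 0.3
    y = X @ w_true + rng.normal(size=n_patients)
    return X, y


def fit_ridge(X, y, alpha=10.0):
    """Ridge regression in dual form, convenient when voxels outnumber patients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    dual = np.linalg.solve(Xc @ Xc.T + alpha * np.eye(len(y)), yc)
    return Xc.T @ dual, x_mean, y_mean


def predict(model, X):
    weights, x_mean, y_mean = model
    return (X - x_mean) @ weights + y_mean


def cross_validated_predictions(X, y, n_folds=5, alpha=10.0, seed=0):
    """One prediction per patient, each from a model that never saw that patient."""
    order = np.random.default_rng(seed).permutation(len(y))
    preds = np.empty(len(y))
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        model = fit_ridge(X[train_idx], y[train_idx], alpha)
        preds[test_idx] = predict(model, X[test_idx])
    return preds


if __name__ == "__main__":
    X, y = simulate_patients()
    in_sample_r = np.corrcoef(predict(fit_ridge(X, y), X), y)[0, 1]
    held_out_r = np.corrcoef(cross_validated_predictions(X, y), y)[0, 1]
    print(f"in-sample r = {in_sample_r:.2f} (optimistic)")
    print(f"cross-validated r = {held_out_r:.2f}")
```

The in-sample correlation is inflated because the model is scored on the patients used to fit it. The cross-validated correlation is the more honest estimate, provided the penalty and other settings are not tuned by looking at the held-out results.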

Publication date: 2016